Linguistic Features for Quality Estimation
Abstract
This paper describes a study on the contribution of linguistically informed features to the task of quality estimation for machine translation at the sentence level. A standard regression algorithm is used to build models using a combination of linguistic and non-linguistic features extracted from the input text and its machine translation. Experiments with English-Spanish translations show that linguistic features, although informative on their own, are not yet able to outperform shallower features based on statistics from the input text, its translation and additional corpora. However, further analysis suggests that linguistic information is useful but needs to be carefully combined with other features in order to produce better results.
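The setup described in the abstract, a regression model trained on a mix of shallow and linguistic features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature functions, the toy `FUNCTION_WORDS` lexicon, and the use of ridge regression as a stand-in for the unspecified "standard regression algorithm" are not taken from the paper.

```python
from typing import List, Sequence

# Toy lexicon standing in for real linguistic analysis (assumption, not from the paper).
FUNCTION_WORDS = {"the", "a", "of", "el", "la", "de", "un", "una"}


def shallow_features(src: str, tgt: str) -> List[float]:
    """Surface statistics: token counts and the target/source length ratio."""
    s, t = src.split(), tgt.split()
    return [float(len(s)), float(len(t)), len(t) / max(len(s), 1)]


def linguistic_features(src: str, tgt: str) -> List[float]:
    """Hypothetical linguistic proxy: share of function words on each side."""
    def ratio(tokens: Sequence[str]) -> float:
        hits = sum(w.lower() in FUNCTION_WORDS for w in tokens)
        return hits / max(len(tokens), 1)
    return [ratio(src.split()), ratio(tgt.split())]


def combined_features(src: str, tgt: str) -> List[float]:
    """Concatenate both feature groups into one vector."""
    return shallow_features(src, tgt) + linguistic_features(src, tgt)


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float],
              lam: float = 1e-3) -> List[float]:
    """Solve (X^T X + lam*I) w = X^T y; the last weight is the bias term.

    The ridge penalty is also applied to the bias, which keeps the code short.
    """
    rows = [list(x) + [1.0] for x in X]  # append a constant bias column
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for c in range(col, n):
                A[k][c] -= f * A[col][c]
            b[k] -= f * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Apply the learned weights to a feature vector (bias appended)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
```

In such a setup, each training pair would be mapped through `combined_features` and fitted against human quality scores; comparing models trained on `shallow_features` alone, `linguistic_features` alone, and the combination mirrors the kind of comparison the abstract reports.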
Similar resources
Linguistic Indicators for Quality Estimation of Machine Translations
This work presents a study of linguistically-informed features for the automatic quality estimation of machine translations. In particular, we address the problem of estimating quality when no reference translations are available, as this is the most common case in real world situations. Unlike previous attempts that make use of internal information from translation systems or rely on purely sh...
Quality estimation for Machine Translation output using linguistic analysis and decoding features
We describe a submission to the WMT12 Quality Estimation task, including an extensive Machine Learning experimentation. Data were augmented with features from linguistic analysis and statistical features from the SMT search graph. Several Feature Selection algorithms were employed. The Quality Estimation problem was addressed both as a regression task and as a discretised classification task, b...
Error Detection for Statistical Machine Translation Using Linguistic Features
Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from N best lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation syste...
A CCG-based Quality Estimation Metric for Statistical Machine Translation
We describe a metric for estimating the quality of Statistical Machine Translation (SMT) output based on syntactic features extracted using Combinatory Categorial Grammar (CCG). CCG has been demonstrated to be better suited to deal with SMT texts than context free phrase structure grammar formalisms. We use CCG features to estimate the grammaticality of the translations by dividing them into ma...
The Effect of Genre Awareness on English Translation Quality and Pedagogy: A Case of News Reports Translation as an Academic Curriculum
To produce an adequate translation, language students are required to learn a variety of language features including syntax, semantics and pragmatics. Considering the curriculum language learners are faced with, one can claim that almost all language students in Iran are taught these features in their academic settings, including linguistics courses. Yet, there are some aspects of language which a...